Abstract

We propose steganographic systems for the case when covertexts
(containers) are generated by an i.i.d. or a finite-memory distribution,
with known or unknown statistics. The probability distributions of
covertexts with and without hidden information are the same; this means
that the proposed stegosystems are perfectly secure, i.e. an observer
cannot determine whether hidden information is being transmitted. In
contrast, existing results only include methods for which the
distributions of covertexts with and without hidden text are close but
not equal.

The speed of transmission of hidden information can be made arbitrarily
close to the theoretical limit, the Shannon entropy of the source of
covertexts. All the proposed algorithms are polynomial-time in all
arguments. An interesting feature of our stegosystems is that they do
not require any (secret or public) key. In other words, a shared secret
is not required to obtain perfect steganographic security.

Keywords: Steganography, Information Hiding, Information Theory, Shannon entropy.

1 Introduction 

The goal of steganography can be described as follows. Alice and Bob can 
exchange messages of a certain kind (called covertexts) over a public channel. 
The covertexts can be, for example, a sequence of photographic images, 
videos, text emails and so on. Alice wants to pass some secret information 
to Bob so that Eve, the observer, cannot notice that any hidden information 
is being passed. Thus, Alice should use the covertexts to hide the secret 
text. It may be assumed that Alice and Bob share a secret key. A classical
illustration from [16] states the problem in terms of communication in a
prison: Alice and Bob are prisoners who want to concoct an escape plan by
passing each other messages which can be read by a warden.

Perhaps the first information-theoretic approach to steganography was
taken by Cachin [1]. In this work the sequence of covertexts is modelled
by a memoryless finite-alphabet distribution. Besides laying out basic
definitions of steganographic protocols and their security, Cachin
constructed a steganographic protocol which, relying on the fact that the
probability distribution of covertexts is known, assures that the
distributions of covertexts with and without hidden information are
statistically close (but, in general, not equal). For the case of an
unknown distribution, a universal steganographic system was proposed, in
which this property holds only asymptotically, with the size of the hidden
message going to infinity. Distribution-free stegosystems are of
particular practical importance, since in reality covertexts can be a
sequence of graphical images, instant or email messages, that is, sources
for which the distribution is not only unknown but perhaps cannot be
reasonably approximated. Cachin has also defined perfectly secure
steganographic systems as those for which the probability distributions
of covertexts with and without hidden information are the same. It is
worth noting that, since C. Shannon's celebrated paper "Communication
theory of secrecy systems" [15], the information-theoretic approach has
been efficiently applied to many problems of secrecy systems, see e.g. [8]
and references therein.

We follow the information-theoretic approach to steganography of [1]
and propose constructions of perfectly secure stegosystems. This means
that in these stegosystems covertexts with and without hidden information
are statistically indistinguishable. In other words, Eve does not get any
information on whether hidden text is being transmitted within the
covertext sequence. This is an improvement over the universal stegosystem
of Cachin in that we replace asymptotic $\varepsilon$-security by perfect
security for any message length. Moreover, we relax the theoretical
assumptions of [1] by allowing the source of covertexts to have an
infinite alphabet and to have a finite memory.

For any stegosystem the next property of interest after its security is
its capacity. The capacity of a stegosystem can be defined as the number
of hidden bits transmitted per letter of covertext. For the case when the
covertexts are drawn from a finite alphabet we show that our stegosystems
have the maximal possible capacity: the number of hidden bits per letter
of covertext approaches (with the length of the block growing) the Shannon
entropy of the source of covertexts. On the other hand, if the size of the
alphabet of the covertext source and its min-entropy tend to infinity
then, in the case of a memoryless source of covertexts, the number of bits
of hidden text per letter of covertext tends to $\log(n!)/n$, where $n$ is
the (fixed) size of blocks used for hidden text encoding.

Another feature of our stegosystems is that they do not require a secret
key. Thus, the constructions presented demonstrate that in order to
achieve perfect steganographic security no secret has to be shared between
the communicating parties. Clearly, in this case Eve (the observer) can
retrieve the secret message being transmitted; however, she will not be
able to say whether it is a secret message or random noise. This property
of our stegosystems (as indeed their secrecy) relies heavily on the fact
that the secret message transmitted is indistinguishable from a Bernoulli
i.i.d. sequence of bits (random noise). This is a standard assumption that
can be easily fulfilled if Alice uses the Vernam cipher (a one-time pad)
to encode the secret before transmitting. For this, obviously, a secret
key is required. In other words, a secret key can be used to obtain
cryptographic security, but it is not required to obtain steganographic
security, as long as the hidden information is already indistinguishable
from random noise. This also means that the proposed stegosystems can be
directly applied for covert public-key cryptographic communication.

The main idea behind the stegosystems we propose is the following.
Suppose that for a sequence of covertexts generated by a source, we can
find a set of covertexts that had the same probability of being generated
as the given one. Moreover, assume that each of these other sequences
defines this set uniquely. Then instead of transmitting the sequence
actually generated, we can transmit one of the sequences in the set, whose
number corresponds to the secret text we want to pass. This does not
change the probabilistic characteristics of the source, provided the
hidden text consists of i.i.d. equiprobable bits. Therefore, an observer
cannot tell whether secret information is being passed.

Consider a simple example. Suppose that Alice wants to pass a single bit,
and assume that the source of covertexts is i.i.d., but its distribution
is unknown. Alice reads two symbols from the source, say ab. She knows
that (since the source is i.i.d.) the probability of ba was the same. So
if Alice's secret bit to pass is 0 she transmits ab, and if she needs to
pass 1 then she transmits ba. However, if the source has generated aa
then Alice cannot pass the secret bit, but she has to transmit aa anyway,
to preserve the probabilistic characteristics of the source. (This
example is considered in more detail in Section 3.) The same idea was
used by von Neumann [10] in his method of extracting random equiprobable
bits from a source of i.i.d. (but not necessarily equiprobable) symbols.
Von Neumann's method is not optimal in the sense that it does not extract
all randomness from the source; in particular, pairs of the type aa are
not used. The steganographic method that we have outlined has the same
disadvantage, namely the rate of transmission of hidden symbols is far
from maximal.

To construct a stegosystem that has a better rate of transmission, one
can use the same ideas that were used to generalize von Neumann's
randomness extractor. One way is to iterate von Neumann's procedure (e.g.
use pairs of pairs aa...bb in the same way as pairs were used). This idea
is due to Peres [11]; it can also be used to construct a stegosystem, but
we do not consider it here. The stegosystems we propose are based on ideas
similar to those used by Elias [4] for generalizing von Neumann's
randomness extractor. The idea is as follows. For a sequence of symbols of
length $n$ output by an i.i.d. source (with unknown characteristics), all
its permutations have the same probability. To pass secret information,
Alice transmits the permuted sequence whose number (in the set of all
permutations) encodes her message. A stegosystem based on this principle
achieves (asymptotically with the block length $n$ growing) the maximal
possible rate of transmission of hidden text: the Shannon entropy of the
source of covertexts. Moreover, an advantage of this idea is that it can
be used beyond i.i.d. sources of covertexts. In particular, we propose a
stegosystem for $k$-order Markovian sources (for any given $k$), also by
considering sequences whose probability is the same as that of the given
one.
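
The pair trick shared by von Neumann's extractor and the simple example can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `von_neumann_extract` and the convention "increasing pair gives 0, decreasing pair gives 1" are choices made here, not taken from [10].

```python
def von_neumann_extract(symbols):
    """Turn i.i.d. (possibly biased) symbols into unbiased bits."""
    bits = []
    # Walk over non-overlapping pairs; a trailing unpaired symbol is dropped.
    for i in range(0, len(symbols) - 1, 2):
        first, second = symbols[i], symbols[i + 1]
        if first == second:
            continue  # pairs such as "aa" carry no bit and are discarded
        # "ab" and "ba" are equally likely under any i.i.d. source,
        # so the order of an unequal pair is a fair coin flip.
        bits.append(0 if first < second else 1)
    return bits


print(von_neumann_extract("aababaaaabbaaaaabb"))
```

The discarded equal pairs are exactly the source of the rate loss mentioned in the text: the more biased the source, the more pairs are thrown away.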

The fact that such a stegosystem has the maximal possible rate of
transmission is based on results of the so-called Theory of Types [2],
widely used in Information Theory. Two sequences are said to be of the
same type if they have the same probability of being output (by a given
source). The result that is of importance for our construction is the
relation of the size of the set of sequences of the same type as a given
one to their empirical entropy, and thus to the entropy of the source. It
is interesting to observe that a distribution-free stegosystem of Cachin
[1] is also based on ideas of the Theory of Types (but this stegosystem is
not perfectly secure).
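
The relation between the size of a type class and the empirical entropy can be observed numerically. The sketch below is an illustration for the i.i.d. case only; the helper names and the 30%/70% test words are assumptions made here. It compares $\log_2$ of the number of distinct permutations of a word, divided by its length, with the empirical entropy of the word.

```python
from collections import Counter
from math import factorial, log2


def type_class_size(u):
    """Number of distinct permutations of u: n! / prod(count(a)!)."""
    size = factorial(len(u))
    for count in Counter(u).values():
        size //= factorial(count)  # each division is exact
    return size


def empirical_entropy(u):
    """Empirical Shannon entropy of u, in bits per symbol."""
    n = len(u)
    return -sum(c / n * log2(c / n) for c in Counter(u).values())


for n in (10, 100, 1000):
    u = "a" * (3 * n // 10) + "b" * (n - 3 * n // 10)  # 30% a, 70% b
    print(n, log2(type_class_size(u)) / n, empirical_entropy(u))
```

The first printed quantity stays below the second and the gap shrinks as the word grows, which is the behaviour the capacity results rely on.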

The rest of the paper is organized as follows. In the next section we
present the basic definitions. Section 3 provides an example of a simple
perfectly secure stegosystem. This stegosystem does not have the maximal
capacity, but it demonstrates the ideas used in the stegosystems that we
present in the subsequent sections. In Section 4 we present a (perfectly
secure) stegosystem for the case of a memoryless source of covertexts,
which has the mentioned asymptotic properties of the rates of hidden text
transmission, and in Section 5 we briefly describe how it can be
algorithmically realized in practice. Section 6 contains an extension of
the stegosystem of Section 4 to the case when the source of covertexts has
a (finite) memory. Finally, Section 7 contains a discussion.

2 Notations and definitions 

We use the following model for steganography, mainly following [1]. It is
assumed that Alice has access to an oracle which generates covertexts
according to some fixed but unknown distribution of covertexts $\mu$.
Covertexts belong to some (possibly infinite) alphabet $A$. Alice wants to
use this source for transmitting hidden messages. It is assumed that Alice
does not know the distribution of covertexts generated by the oracle, but
this distribution is either memoryless or has a finite memory; moreover, a
bound on the memory of the source of covertexts is known to all the
parties (and is used in the stegosystems as a parameter).

A hidden message is a sequence of letters from $B = \{0, 1\}$ generated
independently with equal probabilities of 0 and 1. We denote the source
of hidden messages by $\omega$. This is a commonly used model for the
source of secret messages, since it is assumed that secret messages are
encrypted by Alice using a key shared only with Bob. If Alice uses the
Vernam cipher (a one-time pad) then the encrypted messages are indeed
generated according to the Bernoulli 1/2 distribution, whereas if Alice
uses modern block or stream ciphers the encrypted sequence "looks like" a
sequence of random Bernoulli 1/2 trials. (Here "looks like" means
indistinguishable in polynomial time, or that the likeness is confirmed
experimentally by statistical data, see, e.g. [SldS].) The third party,
Eve, is a passive adversary: Eve is reading all messages passed from Alice
to Bob and is trying to determine whether secret messages are being passed
in the covertexts or not. Clearly, if covertexts with and without hidden
information have the same probability distribution ($\mu$) then it is
impossible to distinguish them. Finite groups of (covertext, hidden,
secret) letters are sometimes called (covertext, hidden, secret) words.
Elements of $A$ ($B$) are usually denoted by $x$ ($y$).

The steganographic protocol can be summarized in the following defini- 
tion. 

Definition 1 (steganographic protocol). Alice draws a sequence of
covertexts $x^* = x_1, x_2, \dots$ generated by a source of covertexts
$\mu$, where $x_i$, $i \in \mathbb{N}$, belong to some (finite or
infinite) alphabet $A$.

Alice has a sequence $y^* = y_1, y_2, \dots$ of secret text generated by a
source $\omega$ of i.i.d. equiprobable bits $y_i$:
$\omega(y_i = 0) = \omega(y_i = 1) = 1/2$, independently for all
$i \in \mathbb{N}$. Alice also has access to a private random sequence
$\Delta = \delta_1, \delta_2, \dots$ of i.i.d. equiprobable bits. The
sources $\mu$, $\omega$, and $\Delta$ are assumed independent.

A stegosystem $St$ is a pair of functions. The encoder maps
$A^n \times \{0, 1\}^\infty \times \{0, 1\}^\infty$ (a block of
covertexts, a secret sequence and randomness) to $A^n$, where
$n \in \mathbb{N}$ is an optional parameter (the block length), whose
value is known to all parties. The decoder is a function from $A^n$ to
$\{0, 1\}^*$.

From $x^*$, $y^*$ and $\Delta$, Alice, using a stegosystem $St$, obtains a
steganographic sequence $X = X_1, X_2, \dots$ that is transmitted over a
public channel to Bob.

Bob (and any possible Eve) receives $X$ and obtains, using the decoder
$St^{-1}(X)$, the resulting sequence $y^*$.

For convenience of notation, the definition is presented in terms of an
infinite sequence of secret text (and random source). It means that a
stegosystem can use as many or as few bits of the hidden text for
transmission in a given block as is needed. In practice, of course, Alice
has only a finite sequence to pass, so she may run out of secret bits when
transmitting the last block of covertexts. In this case we assume that the
end of each message can always be determined (e.g. there is always an
encrypted "end of message" sign at the end), so that Alice can fill up the
remainder with random noise.

Observe that we require by definition of a steganographic system that the 
decoding is always correct. Moreover, we do not consider noisy channels or 
active adversaries, so that Bob always receives what Alice has transmitted. 

Note also that there is no secret key in the protocol. A secret key may
or may not be used before entering into steganographic communication in
order to obtain the hidden sequence $y^*$; however, this is out of the
scope of the protocol.

Definition 2 (perfect security). A steganographic system is called
(perfectly) secure if the sequence of covertexts $x^*$ and the
steganographic sequence $X$ have the same distribution:
$\Pr(x_1, \dots, x_n \in C) = \Pr(X_1, \dots, X_n \in C)$ for any
(measurable) $C \subset A^n$ and any $n \in \mathbb{N}$, where the
probability is taken with respect to all distributions involved: $\mu$,
$\omega$ and $\Delta$.

3 Example: a simple perfectly secure stegosystem 

Next we present a simple stegosystem that demonstrates the main ideas
used in the general stegosystem, which we develop in the next section. The
stegosystem described in this section does not use randomization.

Consider a situation in which not only the secret letters are drawn (using
$\omega$) from a binary alphabet, but also the source of covertexts $\mu$
generates i.i.d. symbols from the alphabet $A = \{a, b\}$ (not necessarily
with equal probabilities). Suppose that Alice has to transmit the sequence
$y^* = y_1 y_2 \dots$ generated according to $\omega$ and let there be
given a covertext sequence $x^* = x_1 x_2 \dots$ generated by $\mu$. For
example, let

$$ y^* = 01100\dots, \qquad x^* = aababaaaabbaaaaabb\dots \tag{1} $$

The sequences $x^*$ and $y^*$ are encoded in a new sequence $X$ (to be
transmitted to Bob) such that $y^*$ is uniquely determined by $X$ and the
distribution of $X$ is the same as the distribution of $x^*$ (that is,
$\mu$; in other words, $X$ and $x^*$ are statistically indistinguishable).

The encoding is carried out in two steps. First let us group all symbols
of $x^*$ into pairs, and denote

$$ aa = u, \quad bb = u, \quad ab = v_0, \quad ba = v_1. $$

In our example, the sequence (1) is represented as

$$ x^* = aa\; ba\; ba\; aa\; ab\; ba\; aa\; aa\; bb \dots = u\, v_1 v_1\, u\, v_0 v_1\, u\, u\, u \dots $$

Then $X$ is acquired from $x^*$ as follows: all pairs corresponding to $u$
are left unchanged, while all pairs corresponding to $v_k$ are transformed
to pairs corresponding to $v_{y_1} v_{y_2} v_{y_3} \dots$; in our example

$$ X = aa\; ab\; ba\; aa\; ba\; ab\; aa\; aa\; bb \dots $$

Decoding is obvious: Bob groups the symbols of $X$ into pairs, ignores all
occurrences of $aa$ and $bb$ and changes $ab$ to 0 and $ba$ to 1.
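
The encoding and decoding just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the cover is assumed to have even length, and the caller is assumed to supply at least as many hidden bits as there are unequal pairs (padding with random bits if the message is shorter).

```python
def st2_encode(cover, bits):
    """Embed bits into cover by choosing the order of each unequal pair."""
    bit_iter = iter(bits)
    out = []
    for i in range(0, len(cover) - 1, 2):
        x1, x2 = cover[i], cover[i + 1]
        if x1 == x2:
            out.append(x1 + x2)  # pairs of type u are left unchanged
        else:
            lo, hi = sorted((x1, x2))
            # bit 0 -> increasing pair (v_0), bit 1 -> decreasing pair (v_1)
            out.append(lo + hi if next(bit_iter) == 0 else hi + lo)
    return "".join(out)


def st2_decode(stego):
    """Recover the hidden bits: skip equal pairs, read the order of the rest."""
    bits = []
    for i in range(0, len(stego) - 1, 2):
        x1, x2 = stego[i], stego[i + 1]
        if x1 != x2:
            bits.append(0 if x1 < x2 else 1)
    return bits


cover = "aababaaaabbaaaaabb"   # x* from example (1)
hidden = [0, 1, 1, 0, 0]       # y* from example (1)
stego = st2_encode(cover, hidden)
print(stego, st2_decode(stego))
```

On the data of example (1) only four pairs are unequal, so only the first four hidden bits are embedded. Because the sketch relies only on comparing the two symbols of a pair, it also works for larger ordered alphabets.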

The properties of the described stegosystem, which we call $St_2$, are
summarized in the following (nearly obvious) statement.

Proposition 1. Suppose that a source $\mu$ generates i.i.d. random
variables taking values in $A = \{a, b\}$ and let this source be used for
encoding secret messages consisting of a sequence of i.i.d. equiprobable
binary symbols using the method $St_2$. Then the sequence of symbols
output by the stegosystem obeys the same distribution $\mu$ as the input
sequence.
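
For a single pair, the statement can be checked with exact arithmetic. The sketch below is an illustration only (the function name and the choice $\mu(a) = 1/3$ are assumptions made here): it computes the law of one output pair when the hidden bit is fair and compares it with the law of the input pair.

```python
from fractions import Fraction
from itertools import product


def st2_pair_output_distribution(p_a):
    """Exact law of one output pair when P(a) = p_a and the hidden bit is fair."""
    prob = {"a": p_a, "b": 1 - p_a}
    half = Fraction(1, 2)
    out = {}
    for x1, x2 in product("ab", repeat=2):
        p_in = prob[x1] * prob[x2]
        if x1 == x2:
            # equal pairs pass through unchanged
            out[x1 + x2] = out.get(x1 + x2, 0) + p_in
        else:
            # an unequal pair becomes ab (bit 0) or ba (bit 1), each w.p. 1/2
            for pair in ("ab", "ba"):
                out[pair] = out.get(pair, 0) + p_in * half
    return out


p = Fraction(1, 3)
dist = st2_pair_output_distribution(p)
print(dist)
```

Each output pair gets exactly the probability the source assigns to it, for instance $\mu(a)\mu(b)$ for both $ab$ and $ba$.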

The proof of this statement is simple, and we omit it since in the next
section a stegosystem is presented that has stronger properties. It is also
easy to see that the same method can be used when the alphabet $A$ is not
binary. Indeed, all that we need to construct $St_2(A)$ is that there is
some (partial) ordering on the set $A$. Then Alice can use each
consecutive pair $a_1 a_2$ such that either $a_1 < a_2$ or $a_2 < a_1$ to
transmit one bit of the secret text.

4 General construction of a universal stegosystem for i.i.d. sources

In this section we consider the general construction of a universal
stegosystem which has the desired asymptotic properties. As before, Alice
needs to transmit a sequence $y^* = y_1 y_2 \dots$ of secret binary
messages drawn by an i.i.d. source $\omega$ with equal probabilities of 0
and 1, and let there be given a sequence of covertexts
$x^* = x_1 x_2 \dots$ drawn i.i.d. by a source $\mu$ from an alphabet $A$.
First we break the sequence $x^*$ into blocks of $n$ symbols each, where
$n > 1$ is a parameter. Each block will be used to transmit several
symbols from $y^*$ (for example, in the previously constructed stegosystem
$St_2(A)$ each block of length 2 was used to transmit 1 or 0 symbols).
However, in the general case a problem arises which was not present in the
construction of $St_2(A)$. Namely, we have to align the lengths of the
blocks of symbols from $x^*$ and from $y^*$, and for this we will need
randomization. The problem is that the probabilities of blocks from $y^*$
are negative powers of 2, which is not necessarily the case with blocks
from $x^*$.

We now present a formal description. Let $u$ denote the first $n$ symbols
of $x^*$ and let $\nu_u(a)$ be the number of occurrences of the symbol $a$
in $u$. Define the set $S_u$ as consisting of all words of length $n$ in
which the frequency of each letter $a \in A$ is the same as in $u$:

$$ S_u = \{ v \in A^n : \forall a \in A \;\; \nu_v(a) = \nu_u(a) \}. $$

Observe that the $\mu$-probabilities of all members of $S_u$ are equal.
Let there be given some ordering on the set $S_u$ (for example,
lexicographical) which is known to both Alice and Bob (and to anyone else)
and let $S_u = \{s_0, s_1, \dots, s_{|S_u|-1}\}$ with this ordering.
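
For short blocks the set $S_u$ can simply be listed. The following brute-force sketch (the function name `type_class` is a choice made here) enumerates all distinct rearrangements of a word in lexicographic order; it is exponential in the block length and is meant only to make the definition concrete.

```python
from itertools import permutations


def type_class(u):
    """All distinct rearrangements of u, in lexicographic order."""
    return sorted(set("".join(p) for p in permutations(u)))


print(type_class("bac"))
print(len(type_class("aabb")))
```

Every listed word has the same letter counts as `u`, hence the same probability under any i.i.d. source.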
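
Denote $m = \lfloor \log_2 |S_u| \rfloor$, where $\lfloor y \rfloor$
stands for the largest integer not greater than $y$. Consider the binary
expansion of $|S_u|$:

$$ |S_u| = (\alpha_m, \alpha_{m-1}, \dots, \alpha_0), \tag{2} $$

where $\alpha_m = 1$, $\alpha_j \in \{0, 1\}$, $m > j \ge 0$. In other
words,

$$ |S_u| = 2^m + \alpha_{m-1} 2^{m-1} + \alpha_{m-2} 2^{m-2} + \dots + \alpha_0. $$

Define a random variable $\Delta$ as taking each value
$i \in \{0, 1, \dots, m\}$ with probability $\alpha_i 2^i / |S_u|$:

$$ p(\Delta = i) = \alpha_i 2^i / |S_u|. \tag{3} $$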
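
The distribution (3) is easy to tabulate from the binary digits of $|S_u|$. The sketch below is illustrative; the names `delta_distribution` and `sample_delta` are hypothetical, and the sampler uses Python's general-purpose `random` module rather than a private source of fair bits as in Definition 1.

```python
import random
from fractions import Fraction


def delta_distribution(size):
    """P(Delta = i) = alpha_i * 2**i / size, alpha = binary digits of size."""
    m = size.bit_length() - 1  # m = floor(log2(size))
    return {i: Fraction(((size >> i) & 1) << i, size) for i in range(m + 1)}


def sample_delta(size, rng=random):
    """Draw one value of Delta according to (3)."""
    dist = delta_distribution(size)
    values = list(dist)
    weights = [float(dist[v]) for v in values]
    return rng.choices(values, weights=weights)[0]


print(delta_distribution(6))  # |S_u| = 6 = (1, 1, 0) in binary
print(sample_delta(6))
```

For $|S_u| = 6$ the variable takes the value 2 with probability 2/3 and the value 1 with probability 1/3, and never the value 0.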

Alice, having read $u$, generates a value of the random variable $\Delta$,
say $d$, and then reads $d$ symbols from $y^*$. Consider the word $r^*$
represented by these symbols as an integer which we denote by $r$. Then we
encode the word $r^*$ (that is, $d$ bits of $y^*$) by the word $s_\tau$
from the set $S_u$, where

$$ \tau = \sum_{l=d+1}^{m} \alpha_l 2^l + r. \tag{4} $$

(In other words, the word $s_\tau$ is being output by the coder.)

Then Alice reads the next block of $n$ covertext symbols, and so on.
Denote the constructed stegosystem by $St_n(A)$.

To decode the received sequence Bob breaks it into blocks of length $n$
and repeats all the steps in the reversed order: by the current word $u$
he obtains $S_u$ and $\tau$. To obtain $d$ Bob finds the largest number
$d' \in \{0, \dots, m\}$ such that $\alpha_{d'} \ne 0$ and
$\tau < \sum_{l=d'}^{m} \alpha_l 2^l$. Then he proceeds to finding $r$ and
$r^*$; that is, he finds the next $|r^*|$ symbols of the secret sequence
$y^*$.

Consider an example which illustrates all the steps of the calculation.
Let $A = \{a, b, c\}$, $n = 3$, $u = bac$. Then
$S_u = \{abc, acb, bac, bca, cab, cba\}$, $|S_u| = 6$, $m = 2$,
$\alpha_2 = 1$, $\alpha_1 = 1$, $\alpha_0 = 0$. Let the sequence of secret
messages be $y^* = 0110\dots$. Suppose the value of $\Delta$ generated by
Alice is 1. Then she reads one symbol of $y^*$ (in this case 0) and
calculates $r^* = 0$, $r = 0$, $\tau = 2^2 + 0 = 4$ and finds the
codeblock $s_4 = cab$. To decode the message, Bob from the block $cab$
calculates $\tau = 4$, $r = 0$, $r^* = 0$ and finds the next symbol of the
secret sequence: 0.
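
The whole block procedure, including the worked example, can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the type class is enumerated by brute force (so only short blocks are practical), the value `d` of $\Delta$ is passed in by the caller, and the `d` hidden bits are read as an integer with the first bit most significant.

```python
from itertools import permutations


def type_class(u):
    """S_u as a lexicographically sorted list of tuples of symbols."""
    return sorted(set(permutations(u)))


def encode_block(u, bits, d):
    """Encode the first d bits into a rearrangement of u, following (4)."""
    s = type_class(u)
    size = len(s)
    assert (size >> d) & 1, "d must index a nonzero binary digit of |S_u|"
    r = int("".join(map(str, bits[:d])) or "0", 2)
    tau = (size >> (d + 1) << (d + 1)) + r  # sum_{l>d} alpha_l 2^l + r
    return "".join(s[tau])


def decode_block(v):
    """Recover the bits hidden in block v (an empty list if d was 0)."""
    s = type_class(v)
    size, tau = len(s), s.index(tuple(v))
    m = size.bit_length() - 1
    for d in range(m, -1, -1):      # look for the largest admissible d
        upper = size >> d << d      # sum_{l>=d} alpha_l 2^l
        if (size >> d) & 1 and tau < upper:
            r = tau - (size >> (d + 1) << (d + 1))
            return [int(b) for b in format(r, "b").zfill(d)] if d else []
    raise ValueError("index outside the type class")


block = encode_block("bac", [0, 1, 1, 0], d=1)
print(block, decode_block(block))
```

With `u = "bac"` and `d = 1` the sketch reproduces the example: the hidden bit 0 is carried by the block `cab`.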

It is clear from the construction that encoding and decoding is unambigu- 
ous and the decoding is always correct. The following theorem establishes 
perfect security of the obtained stegosystem, and its rate of transmission of 
hidden text. 

Theorem 1. Suppose that an unknown source $\mu$ generates i.i.d. random
variables taking values in some alphabet $A$. Let this source be used for
encoding secret messages consisting of a sequence of i.i.d. equiprobable
binary symbols using the described method $St_n(A)$ with $n > 1$. Then

(i) $St_n(A)$ is perfectly secure: the sequence of symbols output by the
stegosystem obeys the same distribution $\mu$ as the input sequence,

(ii) the average number of secret symbols per covertext ($L_n$) satisfies
the following inequality

$$ L_n \ge \frac{1}{n} \left( \sum_{u \in A^n} \mu(u) \log \frac{n!}{\prod_{a \in A} \nu_u(a)!} - 3 \right), \tag{5} $$

where $\mu(u)$ is the $\mu$-probability of the word $u$ and $\nu_u(a)$ is
the number of occurrences of the letter $a$ in the word $u$.
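
Part (i) can be verified exactly for any given size of $S_u$ by combining (3) and (4). The sketch below is an illustration (the function name is hypothetical): it accumulates, with rational arithmetic, the probability that each index $\tau$ is transmitted.

```python
from fractions import Fraction


def output_index_distribution(size):
    """Exact law of the transmitted index tau for a type class of given size."""
    m = size.bit_length() - 1
    law = [Fraction(0)] * size
    for d in range(m + 1):
        if not (size >> d) & 1:
            continue                        # Delta never takes this value
        p_delta = Fraction(1 << d, size)    # P(Delta = d), formula (3)
        offset = size >> (d + 1) << (d + 1)  # sum_{l>d} alpha_l 2^l
        for r in range(1 << d):             # each d-bit word has prob 2**-d
            law[offset + r] += p_delta * Fraction(1, 1 << d)
    return law


print(output_index_distribution(6))
```

Every index receives probability exactly $1/|S_u|$, so within a type class the transmitted block is uniformly distributed, just like the block produced by the source.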






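Proof. For any covertext word $u$, all words in $S_u$ have the same
$\mu$-probability, so the conditional probability of $u$ given that the
block belongs to $S_u$ is $1/|S_u|$. Therefore, to prove the first
statement it is sufficient to show that the probability of occurrence of
each word of $S_u$ in the output sequence is also $1/|S_u|$. Consider the
first block to be transmitted. Let $y_1, y_2, \dots$ be the hidden text to
transmit, and let $r_d$ be the integer represented by $y_1, \dots, y_d$,
that is $r_d = \sum_{k=1}^{d} y_k 2^{d-k}$. Using the notation
$S_u = \{s_\tau : \tau = 0, \dots, |S_u| - 1\}$ we find that for each
$\tau$ the probability of $s_\tau$ in the transmitted steganographic
sequence is

$$
\begin{aligned}
P(\tau) &= \sum_{k=0}^{m} P(\tau \mid \Delta = k)\, P(\Delta = k)
         = \sum_{k=0}^{m} P(\tau \mid \Delta = k)\, \frac{\alpha_k 2^k}{|S_u|} \\
        &= \sum_{k=0}^{m} P\Big( r_k = \tau - \sum_{l=k+1}^{m} \alpha_l 2^l \Big) \frac{\alpha_k 2^k}{|S_u|}
         = \frac{1}{2^{k_\tau}} \cdot \frac{2^{k_\tau}}{|S_u|}
         = \frac{1}{|S_u|},
\end{aligned}
$$

where $k_\tau$ is the unique $k$ with $\alpha_k = 1$ and
$0 \le \tau - \sum_{l=k+1}^{m} \alpha_l 2^l < 2^k$. Here the first and the
last equalities are trivial, the second follows from the definition of
$\Delta$ (3), the third from (4) and the fourth from the fact that the
$y_i$ are independent and equiprobable random bits.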

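The second statement can be obtained by direct calculation of the average
number of symbols from $y^*$ encoded by one block. Indeed, from (3) we
find that for each covertext word $u$ the expected number of symbols
transmitted in the block is

$$ \sum_{k=1}^{m} k\, \frac{\alpha_k 2^k}{|S_u|} = m - \sum_{k=0}^{m} (m - k)\, \frac{\alpha_k 2^k}{|S_u|} \ge m - \sum_{k=0}^{m} \frac{m - k}{2^{m-k}}, $$

where the inequality holds because $\alpha_k \le 1$ and $|S_u| \ge 2^m$.
Having taken into account the identity $\sum_{k=0}^{\infty} k/2^k = 2$ and
the definition $m = \lfloor \log |S_u| \rfloor$ we get that this
expectation is at least $m - 2 > \log |S_u| - 3$. Averaging over $u$ and
dividing by the block length $n$ gives (5); it remains to notice that
$|S_u| = n! / \prod_{a \in A} \nu_u(a)!$ for each word $u$. □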

Let us now consider the asymptotic behaviour of $L_n$ when
$n \to \infty$. The following statement establishes (asymptotically) the
maximum possible rate of transmission of hidden text of the stegosystem
$St_n(A)$.

Corollary 1. If the alphabet $A$ is finite then the average number of
hidden symbols per letter $L_n$ goes to the Shannon entropy $h(\mu)$ of
the source $\mu$ as the block length $n$ goes to infinity; here by
definition $h(\mu) = -\sum_{a \in A} \mu(a) \log \mu(a)$.

Proof. The statement follows from (5) and the law of large numbers.
Indeed, it is a well-known fact of Information Theory (e.g. [5]) that as
$n \to \infty$ with probability 1 we have $\log |S_u| / n \to h(\mu)$. □
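
The convergence can be observed numerically for a binary i.i.d. source by evaluating the right-hand side of (5) directly. The sketch below is illustrative: the function names and the choice $\mu(a) = 0.3$ are assumptions made here, logarithms are taken to base 2, and floating-point arithmetic limits it to block lengths of about a thousand.

```python
from math import comb, log2


def rate_lower_bound(n, p):
    """Right-hand side of (5) for a binary i.i.d. source with P(a) = p."""
    # For a binary alphabet |S_u| = C(n, w), where w is the number of a's in u,
    # and w follows a binomial(n, p) distribution.
    expected_log = sum(
        comb(n, w) * p**w * (1 - p) ** (n - w) * log2(comb(n, w))
        for w in range(n + 1)
    )
    return (expected_log - 3) / n


def entropy(p):
    """Shannon entropy of a binary source, in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


for n in (10, 100, 1000):
    print(n, rate_lower_bound(n, 0.3), entropy(0.3))
```

The bound is well below the entropy for short blocks and approaches it as the block length grows.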






In many real stegosystems the alphabet $A$ is huge (it can consist, for
example, of all possible digital photographs of a given file format, or of
all possible e-mail messages). In such a case it is interesting to
consider the asymptotic behaviour of $L_n$ with fixed $n$ when the
alphabet size $|A|$ goes to infinity. For this we need to define the
so-called min-entropy of the source $\mu$:

$$ H_\infty(\mu) = \min_{a \in A} \{ -\log \mu(a) \}. \tag{6} $$

Corollary 2. Assume the conditions of Theorem 1 and fix the block length
$n > 1$. If $|A| \to \infty$ and $H_\infty(\mu) \to \infty$ then $L_n$
tends to $(\log(n!) - O(1))/n$.

Proof. From $H_\infty(\mu) \to \infty$ it follows that
$\max_{a \in A} \mu(a) \to 0$. Therefore the probability that all letters
in a block are different goes to 1, so that the bound in (5) approaches
$(\log(n!) - 3)/n$. □
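
To get a feeling for the size of this limit, one can use Stirling's formula, a standard fact that is not part of the original argument. With logarithms to base 2:

```latex
\frac{\log_2 n!}{n} \;=\; \log_2 n \;-\; \log_2 e \;+\; O\!\left(\frac{\log n}{n}\right),
\qquad \log_2 e \approx 1.44 .
```

So for a fixed block length $n$ and a very large alphabet, each covertext carries roughly $\log_2 n - 1.44$ hidden bits, up to terms that vanish as $n$ grows.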



5 Complexity of encoding and decoding 

Consider the resource complexity of the stegosystem $St_n(A)$. The only
algorithmically non-trivial part of this stegosystem is in finding the
rank of a given block $u$ in the set $S_u$ of all its permutations, and,
vice versa, finding a block given its rank. (It is clear that all other
operations can be performed in linear time.)

Consider this computational problem in some detail. To store all possible
words from the set $S_u$ would require memory of order
$|A'|^n \, n \log |A'|$ bits (where $A' \subset A$ is the set of all
symbols that occur in $u$ and $n = |u|$; without loss of generality in the
sequel we assume $A = A'$), which is practically unacceptable for large
$n$. However, there are algorithms for solving this
problem with polynomial resource complexity. The first such algorithm,
which uses polynomial memory and time polynomial in $n$ per letter, was
proposed in [7] (see also [3]). The fastest known algorithm needs time
polylogarithmic in $n$ per letter, see [12].

Next we briefly present the ideas behind the algorithm from [7]. Assume
the alphabet $A$ is binary. Let $S$ be the set of binary words of length
$n$ with $w$ ones. The main observation is the following formula, which
gives the lexicographical number of any word $v = x_1 \dots x_n \in S$:

$$ \mathrm{rank}(x_1 \dots x_n) = \sum_{k=1}^{n} x_k \binom{n - k}{w - \sum_{j=1}^{k-1} x_j}, \tag{7} $$

where $\binom{t}{m} = t!/(m!\,(t - m)!)$, $0! = 1$, and
$\binom{t}{m} = 0$ if $t < m$. The proof of this well-known equality can
be found, for example, in [6, 12]. As an example, for $n = 4$, $w = 2$,
$v = 1010$ we have

$$ \mathrm{rank}(1010) = \binom{3}{2} + \binom{1}{1} = 3 + 1 = 4. $$

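The computation by (7) can be performed step by step based on the
following obvious identities:

$$ \binom{t-1}{m} = \binom{t}{m} \frac{t - m}{t}, \qquad \binom{t-1}{m-1} = \binom{t}{m} \frac{m}{t}. $$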




A direct estimation of the number of multiplications and divisions gives a
polynomial time of calculations by (7). The method of finding the word $v$
based on its rank and the generalization to non-binary alphabets are based
on the same equality (7); a detailed analysis can be found in [6, 12].
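
Ranking by formula (7) and its inverse can be sketched for the binary case as follows. This is a straightforward illustration, not the fast algorithm of [12]: the function names are chosen here, and each binomial coefficient is recomputed with `math.comb` (Python 3.8 or later) instead of being updated incrementally.

```python
from math import comb


def rank(word):
    """Lexicographic index of a binary word among words of equal length and weight."""
    n, ones_left, total = len(word), word.count("1"), 0
    for k, symbol in enumerate(word, start=1):
        if symbol == "1":
            # words sharing the prefix but with 0 at position k come first;
            # they must place all remaining ones in the last n - k positions
            total += comb(n - k, ones_left)
            ones_left -= 1
    return total


def unrank(index, n, w):
    """Inverse of rank: the word of length n and weight w with the given index."""
    word = []
    for k in range(1, n + 1):
        zeros_first = comb(n - k, w)  # number of words with 0 at position k
        if index >= zeros_first:
            word.append("1")
            index -= zeros_first
            w -= 1
        else:
            word.append("0")
    return "".join(word)


print(rank("1010"), unrank(4, 4, 2))
```

The call `rank("1010")` reproduces the worked example, and `unrank` walks through the same sum from left to right to recover the word.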



6 A stegosystem for k-order Markov sources of covertexts

In this section we describe a stegosystem which is an extension of the
stegosystem described in Section 4 to the case of $k$-order Markov
sources, where $k \ge 0$. The main idea is the same; first, the given
sequence of covertexts is divided into blocks, say, of length $n > 2k$.
For each block $x = (x_1, \dots, x_n)$, Alice finds all sequences of
covertexts of length $n$ that have the same probability as $x$ and also
have the same $k$ leading and $k$ trailing symbols (the latter has to be
done so that the probability of the sequence of blocks as a whole is
intact). Then Alice enumerates all these sequences, and transmits the one
whose number codes her hidden text. As before, to find the sequences that
have the same probability as the given one, this probability itself does
not have to be known. In fact, words that have the same first $k$ symbols
and the same number of occurrences of all subwords of length $k + 1$ have
the same probability, for any $k$-order Markov distribution.
We now present a formal description.

Definition 3. A source (of covertexts) $\mu$ is called (stationary)
$k$-order Markov, if

$$
\begin{aligned}
&\mu(x_{n+1} = a \mid x_n = a_n, x_{n-1} = a_{n-1}, \dots, x_1 = a_1) \\
&\quad = \mu(x_{k+1} = a \mid x_k = a_n, x_{k-1} = a_{n-1}, \dots, x_1 = a_{n-k+1})
\end{aligned}
$$

for all $n \in \mathbb{N}$, $n \ge k$, and all
$a, a_1, a_2, \dots, a_n \in A$.

Alice is given a sequence of covertexts $x^* = x_1, x_2, \dots$ from an
alphabet $A$ generated by a $k$-order Markov source $\mu$, where $k \ge 0$
is given. As in the i.i.d. case, the stegosystem depends on a parameter
$n$, the block length, which we require to be greater than $2k$.

Let $u$ denote the first $n$ symbols of $x^*$ (the first block), and let
$\nu_u(a_1, \dots, a_{k+1})$ be the number of occurrences of the subword
$a_1 \dots a_{k+1}$ in $u$. Define the set $S_u$ as consisting of all
words of length $n$ in which the frequency of each subword of length
$k + 1$ is the same as in $u$, and whose first and last $k$ symbols are
the same as in $u$:

$$ S_u = \{ v \in A^n : \forall s \in A^{k+1} \;\; \nu_v(s) = \nu_u(s); \;\; \forall t \in \{1, \dots, k, n-k+1, \dots, n\} \;\; v_t = u_t \}. \tag{8} $$
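
For small blocks the set (8) can be listed by exhaustive search, which helps to see what the definition admits. The sketch below is only an illustration (the function names are hypothetical and the search is exponential in the block length); it is not the polynomial-time enumeration discussed later.

```python
from collections import Counter
from itertools import product


def subword_counts(word, length):
    """Counts of all contiguous subwords of the given length."""
    return Counter(word[i:i + length] for i in range(len(word) - length + 1))


def markov_type_class(u, k, alphabet=None):
    """Brute-force S_u from (8): same (k+1)-subword counts, same first/last k symbols."""
    alphabet = sorted(set(u)) if alphabet is None else alphabet
    n, target = len(u), subword_counts(u, k + 1)
    result = []
    for letters in product(alphabet, repeat=n):
        v = "".join(letters)
        if (v[:k] == u[:k] and v[n - k:] == u[n - k:]
                and subword_counts(v, k + 1) == target):
            result.append(v)
    return result  # product() over a sorted alphabet yields lexicographic order


print(markov_type_class("aabab", 1))
print(markov_type_class("bac", 0))
```

For `k = 0` the function returns the permutation class used in the i.i.d. construction; for `k = 1` and `u = "aabab"` only two words qualify, which shows how the extra constraints shrink the set.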

With the set $S_u$ so defined, the rest of the definition is the same as
the definition of the stegosystem $St_n(A)$: in order to encode her secret
text, Alice enumerates $S_u$, and finds the element whose number
corresponds to the secret text; the length of the block of hidden text is
defined using an auxiliary random variable $\Delta$, just as before. For
the second and subsequent blocks the encoding is analogous. The decoding
is also analogous. Denote the described stegosystem $St_n^k(A)$.

The $k$-order (conditional) Shannon entropy $h_k(\mu)$ of a source $\mu$
is defined as follows:

$$ h_k(\mu) = -\sum_{v \in A^k} \mu(v) \sum_{a \in A} \mu(a \mid v) \log \mu(a \mid v). \tag{9} $$

Theorem 2. Suppose that an unknown $k$-order Markov source $\mu$ generates
a sequence of covertexts taking values in some alphabet $A$, where
$k \ge 0$ is known. Let this source be used for encoding secret messages
consisting of a sequence of i.i.d. equiprobable binary symbols using the
described method $St_n^k(A)$ with $n > 1$. Then

(i) the sequence of symbols output by the stegosystem obeys the same
distribution $\mu$ as the input sequence,

(ii) if the alphabet $A$ is finite then the average number of hidden
symbols per letter $L_n$ goes to the $k$-order Shannon entropy $h_k(\mu)$
of the source $\mu$ as $n$ goes to infinity.

Proof. To prove (i) observe that if, as before, $x^*_1, x^*_2, \dots$
denotes the sequence generated by the source of covertexts, and
$X_1, X_2, \dots$ the transmitted sequence, then by construction (cf. the
proof of Theorem 1) we have
$P(X_1, \dots, X_n) = P(x^*_1, \dots, x^*_n)$, where $n$ is the length of
the block. For the second block we have

$$
\begin{aligned}
P(X_{n+1}, \dots, X_{2n} \mid X_1, \dots, X_n)
  &= P(X_{n+1}, \dots, X_{2n} \mid X_{n-k+1}, \dots, X_n) \\
  &= P(X_{n+1}, \dots, X_{2n} \mid x^*_{n-k+1}, \dots, x^*_n)
   = P(x^*_{n+1}, \dots, x^*_{2n} \mid x^*_1, \dots, x^*_n),
\end{aligned}
$$

where the first equality follows from the $k$-Markov property, the second
is by construction (the last $k$ symbols of each block are kept intact),
and the last one (as before) holds because the hidden texts are
equiprobable, as are the elements of $S_u$. The same holds for all the
following blocks, thereby establishing the equality of distributions (i).

Let $S'_u$ be the set of all strings of length $n = |u|$ that have the same
k-type as $u$, that is, the same frequencies of subwords of length $k+1$:
$S'_u = \{v \in A^n : \forall s \in A^{k+1}\ \nu_v(s) = \nu_u(s)\}$. In other
words, $S'_u$ is the same as $S_u$ except that the first and last $k$ symbols
are not fixed. By a result from the theory of types [2], for any $u$ the size
of the set $S'_u$ satisfies $\log |S'_u| = n h_k(P_u) + o(n)$, where $h_k(P_u)$
is the k-th order entropy of the k-order Markov distribution $P_u$ defined by
the empirical frequencies of the word $u$. Since the set $S_u$ is not more than
a constant times smaller than $S'_u$, we also have
$\log |S_u| = n h_k(P_u) + o(n)$. Moreover, the law of large numbers implies
that $h_k(P_u) \to h_k(\mu)$ for $\mu$-almost every sequence $u$ as its size $n$
goes to infinity. Therefore, $\log |S_u| = n h_k(\mu) + o(n)$ with
$\mu$-probability 1. It remains to observe that the expected number of hidden
letters per block is lower bounded by $\log |S_u| - 3$. □
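The counting step of the proof can be checked numerically in the simplest
setting. The following is only a sketch for the memoryless case (k = 0), where
the type class consists of all rearrangements of the block and its size is a
multinomial coefficient; the function names and the 30%/70% two-letter test
words are assumptions made for this illustration and do not come from the
paper.

```python
import math
from collections import Counter


def log2_type_class_size(u):
    """log2 of the number of distinct rearrangements of u (multinomial coefficient)."""
    size = math.factorial(len(u))
    for count in Counter(u).values():
        size //= math.factorial(count)  # each division is exact
    return math.log2(size)


def empirical_entropy(u):
    """Zero-order empirical Shannon entropy of u, in bits per letter."""
    n = len(u)
    return -sum(c / n * math.log2(c / n) for c in Counter(u).values())


# Compare (1/n) log2 |type class| with the empirical entropy as n grows.
for n in (10, 100, 1000, 10000):
    n_a = n * 3 // 10
    u = "a" * n_a + "b" * (n - n_a)  # hypothetical block: 30% 'a', 70% 'b'
    print(n, round(log2_type_class_size(u) / n, 4), round(empirical_entropy(u), 4))
```

The per-letter logarithm of the class size stays below the empirical entropy
and the gap shrinks as n grows, which is the behaviour the o(n) term describes.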

As was the case with $St_n(A)$, the only algorithmically nontrivial part of
the stegosystem $St_n^k(A)$ is the enumeration of all the sequences of the same
type as a given one. Although it is clear that a polynomial-time algorithm for
this problem can be found, based on the same ideas as outlined in Section 5,
we cannot point to any source describing such an algorithm.
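To make the object being enumerated concrete, here is a brute-force sketch of
the set of words that share the k-type and the first and last k symbols of a
given word. It is exponential in the block length and is therefore not the
polynomial-time algorithm discussed above; the names `ktype` and
`same_type_class` are hypothetical, and the k-type is taken here to mean the
counts of all subwords of length k + 1.

```python
from collections import Counter
from itertools import product


def ktype(word, k):
    """Counts of all subwords of length k + 1 occurring in word."""
    return Counter(word[i:i + k + 1] for i in range(len(word) - k))


def same_type_class(u, k, alphabet):
    """Words with the same k-type and the same first and last k symbols as u.

    Brute force over all len(alphabet) ** len(u) candidates, in
    lexicographic order; suitable for tiny examples only.
    """
    target = ktype(u, k)
    head, tail = u[:k], u[len(u) - k:]
    result = []
    for letters in product(sorted(alphabet), repeat=len(u)):
        v = "".join(letters)
        if v[:k] == head and v[len(v) - k:] == tail and ktype(v, k) == target:
            result.append(v)
    return result


print(same_type_class("aabab", 1, "ab"))
```

The position of a word in the returned list can serve as its index within the
class, which is what an encoder and a decoder would both need to agree on.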

7 Discussion 

We have proposed two stegosystems (with and without randomization) for
which the output sequence of covertexts with hidden information is
statistically indistinguishable from a sequence of covertexts without hidden
information. The proposed stegosystems are based on the assumption that the
source of covertexts has either zero (Sections 3 and 4) or finite (Section 6)
memory. Even the former assumption is reasonable if we want to embed a
secret message into a sequence of, say, graphical images, videos or texts of
a certain kind. If, for example, we want to use just one image to transmit 
(a large portion of) a secret text then our covertexts are parts of the image, 
which are clearly not i.i.d. How to extend the ideas developed in this work 
to the case of non-i.i.d. covertexts is perhaps the main open question. 

However, as mentioned in the introduction, the main ideas behind our
stegosystems are not limited to the case of finite-memory sources.
What we require is that for any given block of covertexts one can find
a set of other blocks of the same probability, such that each of these
other blocks determines this set uniquely. Sequences of the same
probability are called sequences of the same type. Sets of sequences of the
same type can be identified for some distributions other than those with
finite memory, for example for renewal processes (see [2] and references
therein). Moreover, if there is one such set whose probability is close to 1
(as is the case, for a large block size, for finite-memory sources and for the
aforementioned renewal processes), then the rate of transmission of hidden
text will be close to the entropy of the source. It is challenging to find
(high-probability) sets of equiprobable covertexts that would be relevant for
those steganographic applications in which only a single object (such as a
video or an image) is available for embedding the secret message.

Another matter of interest from the algorithmic point of view is
randomization. The stegosystems proposed in Sections 4 and 6 use randomization.
The first observation here is that if, instead of using random numbers, Alice
uses pseudorandom numbers which are indistinguishable from truly random
ones in polynomial time, then the same hardness property carries over to the
security of the stegosystem: the distributions of covertexts with and without
secret information will be indistinguishable in polynomial time.
In other words, the security of the stegosystem will be at least as good as
that of the random number generator involved. The same applies to
the assumption on the secret text. Namely, we have assumed that it is a
sequence of independent tosses of an unbiased coin. Without
this assumption, we can say that, in general, it will be as hard to distinguish
the sequence of covertexts with hidden information from that without, as
it is to distinguish the secret text from i.i.d. unbiased coin tosses. Another
observation is that perfectly secure stegosystems that do not use
randomization can be constructed. For example, the stegosystem $St_2(A)$ of
Section 3 does not use randomization. The stegosystems of Sections 4 and 6 can
be made non-randomized, sacrificing (at most) half of the rate of secret text
transmission. Indeed, the need for randomization in these stegosystems
arises from the fact that the size of the sets $S_u$ may not be a power of 2. A
non-randomized version of the stegosystem can be obtained as follows. Alice
uses only a subset of $S_u$ that has as its size the largest power of 2 smaller
than $|S_u|$ to encode the secret message; this is done exactly as in $St_n$
for the case A = m. In case the actual block of covertexts observed does not
belong to this set, she transmits this block unchanged, without coding any
secret message. The rate of transmission of secret text for this non-randomized
stegosystem is asymptotically lower bounded by $h(\mu)/2$ (instead of $h(\mu)$).
However, such a stegosystem does not use random numbers.
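The non-randomized variant can be sketched for the memoryless case as follows.
This is an illustration under several assumptions rather than the construction
of the paper: the type class is enumerated by brute force, the chosen subset is
the first 2^m words in lexicographic order with 2^m the largest power of 2 not
exceeding the class size, the secret bits are given as a string of '0' and '1'
characters, and the names `type_class`, `encode_block` and `decode_block` are
hypothetical.

```python
from itertools import permutations


def type_class(block):
    """Sorted list of all distinct rearrangements of block (brute force)."""
    return sorted({"".join(p) for p in permutations(block)})


def encode_block(block, bits):
    """Return (block to transmit, number of secret bits consumed).

    Assumes bits holds at least m pending secret bits.
    """
    cls = type_class(block)
    m = len(cls).bit_length() - 1  # 2**m <= len(cls) < 2**(m + 1)
    if cls.index(block) >= 2 ** m:
        return block, 0  # outside the subset: send unchanged, hide nothing
    if len(bits) < m:
        raise ValueError("not enough secret bits supplied")
    index = int(bits[:m], 2) if m else 0
    return cls[index], m  # the secret bits select the word that is sent


def decode_block(received):
    """Return the secret bits hidden in a received block ('' if none)."""
    cls = type_class(received)
    m = len(cls).bit_length() - 1
    index = cls.index(received)
    if index >= 2 ** m or m == 0:
        return ""
    return format(index, "0{}b".format(m))


sent, used = encode_block("aabc", "101")
print(sent, used, decode_block(sent))
```

Under these assumptions, when the observed block and the secret bits are both
uniform, every word of the class is transmitted with the same probability: a
word in the subset is produced by exactly one bit pattern, and a word outside
it is passed through unchanged.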